What Is Claimed Is: 



1. A method for detecting a thermal anomaly in a computer system, comprising:
deriving an estimated signal for a thermal sensor in the computer system, wherein the estimated signal is derived from correlations with other instrumentation signals in the computer system;
comparing an actual signal from the thermal sensor with the estimated signal to determine whether a thermal anomaly exists in the computer system; and
if a thermal anomaly exists, generating an alarm.

2. The method of claim 1, wherein generating the alarm involves communicating the alarm to a system administrator so that the system administrator can take remedial action.

3. The method of claim 2, wherein communicating the alarm to the system administrator involves communicating information specifying the nature of the thermal anomaly to the system administrator.

4. The method of claim 1, wherein comparing the actual signal with the estimated signal involves using sequential detection methods to detect changes in the relationship between the actual signal and the estimated signal.

5. The method of claim 4, wherein the sequential detection methods include the Sequential Probability Ratio Test (SPRT).

6. The method of claim 1, wherein prior to deriving the estimated signal, the method further comprises determining correlations between instrumentation signals in the computer system, whereby the correlations can subsequently be used to generate estimated signals for thermal sensors.

7. The method of claim 6, wherein determining the correlations involves using a non-linear, non-parametric regression technique to determine the correlations.

8. The method of claim 7, wherein the non-linear, non-parametric regression technique can include a multivariate state estimation technique.

9. The method of claim 1, wherein the instrumentation signals can include:
signals associated with internal performance parameters maintained by software within the computer system;
signals associated with physical performance parameters measured through sensors within the computer system; and
signals associated with canary performance parameters for synthetic user transactions, which are periodically generated for the purpose of measuring quality of service from an end user's perspective.

10. The method of claim 1,
wherein deriving the estimated signal for the thermal sensor involves deriving multiple estimated signals for multiple thermal sensors in the computer system; and
wherein comparing the actual signal with the estimated signal involves comparing multiple actual signals with the multiple estimated signals to determine whether a thermal anomaly exists in the computer system.

11. A computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for detecting a thermal anomaly in a computer system, the method comprising:
deriving an estimated signal for a thermal sensor in the computer system, wherein the estimated signal is derived from correlations with other instrumentation signals in the computer system;
comparing an actual signal from the thermal sensor with the estimated signal to determine whether a thermal anomaly exists in the computer system; and
if a thermal anomaly exists, generating an alarm.

12. The computer-readable storage medium of claim 11, wherein generating the alarm involves communicating the alarm to a system administrator so that the system administrator can take remedial action.

13. The computer-readable storage medium of claim 12, wherein communicating the alarm to the system administrator involves communicating information specifying the nature of the thermal anomaly to the system administrator.

14. The computer-readable storage medium of claim 11, wherein comparing the actual signal with the estimated signal involves using sequential detection methods to detect changes in the relationship between the actual signal and the estimated signal.

15. The computer-readable storage medium of claim 14, wherein the sequential detection methods include the Sequential Probability Ratio Test (SPRT).



16. The computer-readable storage medium of claim 11, wherein prior to deriving the estimated signal, the method further comprises determining correlations between instrumentation signals in the computer system, whereby the correlations can subsequently be used to generate estimated signals.

17. The computer-readable storage medium of claim 16, wherein determining the correlations involves using a non-linear, non-parametric regression technique to determine the correlations.

18. The computer-readable storage medium of claim 17, wherein the non-linear, non-parametric regression technique can include a multivariate state estimation technique.

19. The computer-readable storage medium of claim 11, wherein the instrumentation signals can include:
signals associated with internal performance parameters maintained by software within the computer system;
signals associated with physical performance parameters measured through sensors within the computer system; and
signals associated with canary performance parameters for synthetic user transactions, which are periodically generated for the purpose of measuring quality of service from an end user's perspective.

20. The computer-readable storage medium of claim 11,
wherein deriving the estimated signal for the thermal sensor involves deriving multiple estimated signals for multiple thermal sensors in the computer system; and
wherein comparing the actual signal with the estimated signal involves comparing multiple actual signals with the multiple estimated signals to determine whether a thermal anomaly exists in the computer system.

21. An apparatus that detects a thermal anomaly in a computer system, comprising:
an estimation mechanism configured to derive an estimated signal for a thermal sensor in the computer system, wherein the estimated signal is derived from correlations with other instrumentation signals in the computer system;
a comparison mechanism configured to compare an actual signal from the thermal sensor with the estimated signal to determine whether a thermal anomaly exists in the computer system; and
an alarm generation mechanism, wherein if a thermal anomaly exists, the alarm generation mechanism is configured to generate an alarm.

22. The apparatus of claim 21, wherein the alarm generation mechanism is configured to communicate the alarm to a system administrator so that the system administrator can take remedial action.

23. The apparatus of claim 22, wherein the alarm generation mechanism is configured to communicate information specifying the nature of the thermal anomaly to the system administrator.

24. The apparatus of claim 21, wherein the comparison mechanism is configured to use sequential detection methods to detect changes in the relationship between the actual signal and the estimated signal.



25. The apparatus of claim 24, wherein the sequential detection methods include the Sequential Probability Ratio Test (SPRT).

26. The apparatus of claim 21, further comprising a correlation determination mechanism configured to determine correlations between instrumentation signals in the computer system, whereby the correlations can subsequently be used to generate estimated signals.

27. The apparatus of claim 26, wherein the correlation determination mechanism is configured to use a non-linear, non-parametric regression technique to determine the correlations.

28. The apparatus of claim 27, wherein the non-linear, non-parametric regression technique can include a multivariate state estimation technique.

29. The apparatus of claim 21, wherein the instrumentation signals can include:
signals associated with internal performance parameters maintained by software within the computer system;
signals associated with physical performance parameters measured through sensors within the computer system; and
signals associated with canary performance parameters for synthetic user transactions, which are periodically generated for the purpose of measuring quality of service from an end user's perspective.

Attorney Docket No. SUN-P8737-SPL Inventors: Gross et al.

30. The apparatus of claim 21,
wherein the estimation mechanism is configured to derive estimated signals for multiple thermal sensors in the computer system; and
wherein the comparison mechanism is configured to compare multiple actual signals with the multiple estimated signals to determine whether a thermal anomaly exists in the computer system.
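
For illustration only, and not as part of the claims, the flow recited in claims 1 and 4 through 8 might be sketched as follows in Python. Everything concrete here is an assumption made for the example: the Nadaraya-Watson kernel regression is a simple non-linear, non-parametric stand-in and is not the multivariate state estimation technique itself; the two instrumentation signals (normalized CPU load and fan speed), the synthetic temperature model, the injected 5-degree fault, and all names and parameter values (kernel_estimate, Sprt, run_demo, shift, sigma, alpha, beta, bandwidth) are hypothetical.

```python
import math
import random


def kernel_estimate(train_x, train_y, x, bandwidth=1.0):
    """Nadaraya-Watson kernel regression (an assumed stand-in, not MSET).

    Estimates the thermal sensor value at x as a Gaussian-weighted average
    of training observations with similar instrumentation signals.
    """
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(row, x)) / (2 * bandwidth ** 2))
        for row in train_x
    ]
    total = sum(weights)
    if total == 0.0:
        # Query is far from all training data: fall back to the training mean.
        return sum(train_y) / len(train_y)
    return sum(w * y for w, y in zip(weights, train_y)) / total


class Sprt:
    """Wald SPRT for a positive mean shift in Gaussian residuals."""

    def __init__(self, shift, sigma, alpha=0.01, beta=0.01):
        self.shift = shift                          # assumed anomaly magnitude
        self.sigma = sigma                          # assumed residual std. dev.
        self.upper = math.log((1 - beta) / alpha)   # crossing: decide "anomaly"
        self.lower = math.log(beta / (1 - alpha))   # crossing: decide "normal"
        self.llr = 0.0

    def update(self, residual):
        """Add one residual; return True if the anomaly hypothesis is accepted."""
        # Log-likelihood ratio of N(shift, sigma^2) against N(0, sigma^2).
        self.llr += (self.shift / self.sigma ** 2) * (residual - self.shift / 2)
        if self.llr >= self.upper:
            self.llr = 0.0   # restart the test after raising an alarm
            return True
        if self.llr <= self.lower:
            self.llr = 0.0   # restart the test after accepting "normal"
        return False


def run_demo(seed=0):
    """Train on synthetic telemetry, then monitor; return alarm time indices."""
    rng = random.Random(seed)

    def true_temp(load, fan):
        # Hypothetical system: hotter under load, cooler with more fan speed.
        return 40.0 + 20.0 * load - 8.0 * fan

    # Training phase: learn correlations from anomaly-free data (claim 6).
    train_x, train_y = [], []
    for _ in range(500):
        load, fan = rng.random(), rng.random()
        train_x.append((load, fan))
        train_y.append(true_temp(load, fan) + rng.gauss(0, 0.5))

    # Monitoring phase: estimate, compare, and generate alarms (claim 1).
    detector = Sprt(shift=3.0, sigma=1.0)
    alarms = []
    for t in range(400):
        load, fan = rng.uniform(0.2, 0.8), rng.uniform(0.2, 0.8)
        offset = 5.0 if t >= 200 else 0.0   # injected fault from t = 200 on
        actual = true_temp(load, fan) + offset + rng.gauss(0, 0.5)
        estimate = kernel_estimate(train_x, train_y, (load, fan), bandwidth=0.1)
        if detector.update(actual - estimate):
            alarms.append(t)
    return alarms


if __name__ == "__main__":
    alarms = run_demo()
    print("first alarm at sample", alarms[0] if alarms else None)
```

In this sketch the alarm is simply recorded as a time index; communicating it to a system administrator, as in claims 2 and 3, would replace the append call. Extending the loop to several thermal sensors, as in claim 10, would amount to keeping one estimator and one Sprt instance per sensor.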